Non-uniform Instruction Scheduling

نویسندگان

  • Joseph J. Sharkey
  • Dmitry V. Ponomarev
چکیده

Dynamic instruction scheduling logic is one of the most critical and cycle-limiting structures in modern superscalar processors, and it is not easily pipelined without significant losses in performance. However, these performance losses are incurred only due to a small fraction of instructions, which are intolerant to the non-atomic scheduling. We first perform an empirical analysis of the instruction streams to determine which instructions actually require single cycle scheduling. We then propose a Non-Uniform Scheduler – a design that partitions the scheduling logic into two queues, each with dedicated wakeup and selection logic: a small Fast Issue Queue (FIQ) to issue critical instructions in the back-to-back cycles and a large Slow Issue Queue (SIQ) to issue the remaining instructions over two cycles with a one cycle bubble between dependent instructions. Finally, we propose and evaluate several steering mechanisms to effectively distribute instructions between the queues.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Guaranteeing Forward Progress of Unified Register Allocation and Instruction Scheduling

Increasingly demanding computation requirements and tighter energy constraints have motivated distributed and/or hierarchical register file (dhrf) organizations as a mean to efficiently sustain a sufficient alu utilization in processors targeting embedded applications with many alus. Compared to conventional centralized register file organizations, dhrfs lead to tighter coupling between registe...

متن کامل

User-level scheduling on NUMA multicore systems under Linux

The problem of scheduling on multicore systems remains one of the hottest and the most challenging topics in systems research. Introduction of non-uniform memory access (NUMA) multicore architectures further complicates this problem, as on NUMA systems the scheduler needs not only consider the placement of threads on cores, but also the placement of memory. Hardware performance counters and har...

متن کامل

Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors

The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache-miss of any of the operations...

متن کامل

An effective and efficient code generation algorithm for uniform loops on non-orthogonal DSP architecture

To meet ever-increasing demands for higher performance and lower power consumption, many high-end digital signal processors (DSPs) commonly employ non-orthogonal architecture. This architecture typically is characterized by irregular data paths, heterogeneous registers, and multiple memory banks. Moreover, sufficient compiler support is obviously important to harvest its benefits. However, usua...

متن کامل

Register allocation sensitive region scheduling

Because of the interdependences between instruction scheduling and register allocation, it is not clear which of these two phases should run rst. In this paper, we describe how we modiied a global instruction scheduling technique to make it cooperate with a subsequent register allocation phase. In particular, our cooperative global instruction scheduler performs region scheduling transformation...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005